PMC text mining subset in BioC: about three million full-text articles and growing

Similar articles

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe th...

Extended dependency graph for BioC-compatible protein-protein interaction (PPI) passage detection in full-text articles

Protein-protein interaction (PPI) is important in the field of experimental biology as well as bioinformatics. In BioCreative V, we participated in the BioC task and developed a PPI system to detect passages with PPIs in the full-text articles. By adopting the BioC format, the output of the system could be seamlessly added to the biocuration tool with little effort required for the system integ...

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we ...

The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provid...

Identification of Important Text in Full Text Articles Using Summarization

Other research has shown that although the abstract is more information dense, the full text of a scientific article in the biomedical domain has much greater information content.¹ We know from observing indexers and studying their indexing process that some of the assigned MeSH concepts do not appear in the abstract. The indexing manual also dictates that the abstract should not be used during...

Journal

Journal title: Bioinformatics

Year: 2019

ISSN: 1367-4803,1460-2059

DOI: 10.1093/bioinformatics/btz070